We show that many ideal observer models used to decode neural activity can be 
generalized to a conceptually and analytically simple form. This enables us to study the 
statistical properties of this class of ideal observer models in a unified manner. We 
consider in detail the problem of estimating the performance of this class of models. 
We formulate the problem de novo by deriving two equivalent expressions for the 
performance and introducing the corresponding estimators. We obtain a lower bound on 
the number of observations (N) required for the estimate of the model performance to 
lie within a specified confidence interval at a specified confidence level. We show that 
these estimators are unbiased and consistent, with variance approaching zero at the rate 
of 1/N. We find that the maximum likelihood estimator for the model performance is not 
guaranteed to be the minimum variance estimator even for some simple parametric forms 
(e.g., exponential) of the underlying probability distributions. We discuss the application of 
these results for designing and interpreting neurophysiological experiments that employ 
specific instances of this ideal observer model. 

Keywords: ideal observer model, signal detection theory, neural decoding, receiver operating characteristic, 
maximum likelihood estimation 



INTRODUCTION 

Ideal observer models are an important tool in the effort to under- 
stand the neural bases of perception and behavior (FitzHugh, 
1957; Ratliff, 1962; De Valois et al., 1967; Ratliff et al., 1968; 
Talbot et al., 1968; Barlow and Levick, 1969; Barlow et al., 1971; 
Mountcastle et al., 1972; Johansson and Vallbo, 1979; Bradley 
et al., 1987; Newsome et al., 1989; Vogels and Orban, 1990; 
Geisler, 2001). Ideal observer analysis can be applied to the organ- 
ism as a whole, as in psychophysical studies, or to a specific stage 
of information processing within the visual system of the organ- 
ism, as is often done in neurophysiological studies (sometimes 
referred to as "sequential ideal observer analysis," see Geisler, 
1989). Here we focus exclusively on ideal observer models that 
arise in the analysis and interpretation of neurophysiological data. 
In this context, we define an ideal "observer" model as a set of 
operations and processes by which the experimenter optimally 
decodes stimuli, perceptual decisions, or behavioral outcomes 
from sensory neural activity (Green and Swets, 1966; Geisler, 
1989, 2001, 2004). In the early stages of a sensory system, such 
an ideal "observer" model can be used to study the efficiency of 
a neuron. For example, Barlow et al. (1971) used an ideal detec- 
tor model to compute detection probability from the number of 
photons absorbed by photoreceptors and related the results to 
retinal ganglion cell responses. In this manner, they were able to 
estimate the average number of impulses emitted by a retinal gan- 
glion cell per quantum of light absorbed by photoreceptors. They 
concluded ganglion cells are efficient and sensitive. In the inter- 
mediate stages of sensorimotor transformation, ideal observer 
models are often used to optimally decode behavioral choice-related
information from the responses of a single sensory neuron
(Celebrini and Newsome, 1994; Britten et al., 1996). Such
analyses associate neural responses with perceptual decisions (rev. 
Parker and Newsome, 1998). Ideal observer analysis can also be 
applied to optically imaged cortical signals to assess neural pop- 
ulation sensitivity for detection or discrimination (Chen et al., 
2006, 2008; Purushothaman et al., 2009; see also rev: Cohen et al., 
2011). 

The statistical properties of an ideal observer model affect the
conclusions drawn from it. For example, it is commonly assumed that an
ideal observer yields an unbiased estimate of performance and that increasing
the number of trials decreases the variance of this estimate. These assumptions are
generally valid when the underlying probability distributions take 
certain parametric forms but deviations from these assumptions 
can influence the results. Furthermore, it is not always straight- 
forward to take into account confidence intervals for model per- 
formance in interpreting the results. Statistically valid methods 
of computing confidence intervals are known for some applications
(e.g., Agarwal et al., 2005; Sarma et al., 2011) but this is not
true in general. Therefore, heuristics or Monte Carlo simulations
are used to compute confidence intervals of ideal observer performance
where necessary (e.g., Purushothaman et al., 2009). The
main goal of this paper is to investigate the statistical properties 
and limitations of ideal observer models commonly used in the 
analyses of neurophysiological data. To achieve this goal, we first 
generalize four common forms of such ideal observer models. 

The first of these was used in studies of the absolute visual 
detection threshold (Hecht et al., 1942; Hartline et al., 1947; 
Ratliff, 1962). Hecht et al. (1942) showed that the probability 
with which human observers detected flashes of light, which presumably
delivered a certain average number of quanta of energy
(a) to the retina, closely followed the probability of drawing a 
"threshold" number of n or more quanta from a Poisson distribu- 
tion with mean arrival rate a. Analysis of the electrophysiological 
data of Hartline et al. (1947) from the Limulus eye showed that 
the frequency with which a neuron emitted at least a criterion 
number (Nc) of impulses also closely followed the probability 
of drawing Nc or more impulses from the Poisson distribution 
with arrival rate equal to a (Ratliff, 1962). Implicit in this analy- 
sis is the linking hypothesis that the neuron signals to the animal 
the presence of an external stimulus whenever the number of 
impulses emitted by the neuron is greater than or equal to Nc 
(Teller, 1984). Given this hypothesis, the ideal observer model 
estimates the maximum detection probability for a set of neural 
responses. It can be said that the criterion Nc is chosen in this 
model to fit detection probabilities but without regard to the false 
alarm rate. Since the "maintained" or "background" discharge 
rate of the neuron also fluctuates (Ratliff et al., 1968; Barlow and 
Levick, 1969), in some trials, the number of impulses emitted by 
the neuron will equal or exceed Nc simply due to this random 
fluctuation and the ideal observer will falsely signal the presence 
of a stimulus. This false alarm rate is not incorporated into this 
model. 

The second ideal observer model we consider takes the false 
alarm rate into account [e.g., Barlow and Levick, 1969; rev. 
Green and Swets, 1966]. Typically, the probability distribu- 
tion of the number of impulses in the maintained discharge 
is used to determine Nc so that the probability of false alarm 
is less than or equal to a predetermined value [e.g., 0.2% in 
Barlow and Levick (1969)]. The probability distribution for 
the stimulus-induced response will then determine the detec- 
tion rate for this criterion. The ideal observer in this analy- 
sis performs essentially the same operation as the one above, 
signaling the presence of a stimulus whenever the number of 
impulses emitted by the neuron exceeds Nc. But this criterion
value is chosen based on a constraint on the false alarm
rate.
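
As an illustration of this second model, the following is a minimal sketch of how a criterion could be chosen from the maintained discharge and then applied to the stimulus-induced response. It assumes Poisson-distributed impulse counts, uses the "equal or exceed" convention for the criterion, and the two mean rates are hypothetical values chosen only for the example; none of these choices come from the studies cited above.

```python
import math


def poisson_tail(mean, n_min):
    """P(X >= n_min) for X ~ Poisson(mean)."""
    # Sum the probability mass strictly below n_min and take the complement.
    below = sum(math.exp(-mean) * mean**k / math.factorial(k) for k in range(n_min))
    return 1.0 - below


def criterion_for_false_alarm(background_mean, max_false_alarm):
    """Smallest integer N_c with P(background count >= N_c) <= max_false_alarm."""
    n_c = 0
    while poisson_tail(background_mean, n_c) > max_false_alarm:
        n_c += 1
    return n_c


# Hypothetical mean counts per counting window (assumptions for illustration).
background_mean = 5.0   # maintained discharge
stimulus_mean = 15.0    # stimulus-induced response

n_c = criterion_for_false_alarm(background_mean, max_false_alarm=0.002)
false_alarm = poisson_tail(background_mean, n_c)
hit_rate = poisson_tail(stimulus_mean, n_c)
print(n_c, false_alarm, hit_rate)
```

The criterion depends only on the background distribution and the permitted false alarm rate; the stimulus distribution then fixes the detection rate, as described in the paragraph above.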

The third model arises in Two-Alternative Forced-Choice (2-AFC)
paradigms employed in detection and discrimination studies
(Green and Swets, 1966). Typically, a reference stimulus and a test
stimulus are presented either at two spatial locations (simultaneously)
or in two temporal intervals (sequentially). The task
of the observer is to indicate the location or the interval in 
which the test stimulus occurred. Because decisions are based 
on the comparison of two stimuli or neural responses to two 
stimuli, there is no need in this case to set a fixed criterion 
level. For example, the ideal observer can consistently associate 
the larger response with the test stimulus (e.g., Barlow et al., 
1971). Computationally, the experimenter builds two histograms 
of neural responses, one each for the reference and test stim- 
uli. The correct detection or discrimination probability for the 
ideal observer in the 2-AFC task is then the average rate at 
which the observer can correctly identify which sample belongs 
to which distribution when presented with two random samples, 
one drawn from the reference distribution and the other from the 
test distribution (Green and Swets, 1966). This probability can
be estimated as the area under the receiver operating characteristic
(ROC) curve for the pair of histograms (Green and Swets,
1966).
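
The 2-AFC decision rule described above can be sketched as a direct simulation. This is only an illustration: the Gaussian response distributions, their parameters, and the fair-guess handling of ties are assumptions, and `simulate_2afc` is a hypothetical helper name.

```python
import random


def simulate_2afc(draw_reference, draw_test, n_trials, rng):
    """Fraction of trials on which the larger response came from the test stimulus."""
    correct = 0.0
    for _ in range(n_trials):
        r_ref = draw_reference(rng)
        r_test = draw_test(rng)
        if r_test > r_ref:
            correct += 1.0
        elif r_test == r_ref:
            correct += 0.5  # assumption: ties are resolved by a fair guess
    return correct / n_trials


rng = random.Random(0)
p_correct = simulate_2afc(
    draw_reference=lambda g: g.gauss(10.0, 2.0),  # hypothetical reference responses
    draw_test=lambda g: g.gauss(13.0, 2.0),       # hypothetical test responses
    n_trials=20000,
    rng=rng,
)
print(p_correct)
```

No criterion appears anywhere in the simulation; the observer only compares the two responses on each trial.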

The fourth model we consider is computationally similar to 
the third model but has an important conceptual difference in 
that it is used to predict the choices made by a subject in a 
2-AFC task based on the neural responses for near-threshold 
stimuli (Johansson and Vallbo, 1979; Celebrini and Newsome, 
1994; Britten et al., 1996). This analysis can be used to link 
subjective perceptual decisions to single neuron responses (rev. 
Parker and Newsome, 1998; Romo, 2001; see also Vallbo and 
Johansson, 1980). As a consequence, this ideal observer model 
has found wide application recently (Dodd et al., 2001; Cook and 
Maunsell, 2002; Romo et al., 2002; Williams et al., 2003; Stoet 
and Snyder, 2004; Uka and DeAngelis, 2004; Williams et al., 2004; 
Purushothaman and Bradley, 2005; Pessoa and Padmala, 2005; 
Gu et al., 2007, 2008; Cohen and Newsome, 2009; Bosking and 
Maunsell, 2011). 

The main difference between the first two ideal observer mod- 
els and the last two is that the latter models are presented with 
two observations instead of one, making it possible to render 
decisions based on a direct comparison of the given observations,
independent of a free parameter in the form of a constant
criterion number. While this makes the two types of ideal
observers different from a functional point of view, it is possible
to have a single mathematical framework within which
the performance of both types of models can be quantitatively
described. Consider an ideal observer with two inputs $r_0$ and
$r_1$ and two outputs $C_0$ and $C_1$. Let $P(r_0)$ and $P(r_1)$ be the
probability distributions of the two input variables. In the following,
we show that with appropriate choices for $C_0$, $C_1$ and
$P(r_0)$, $P(r_1)$, this ideal observer can be used for absolute sensory
detection tasks (first two categories described above) as
well as for 2-AFC tasks (last two categories). In this framework,
the performance (i.e., true positive, false positive, true nega- 
tive, and false negative rates) of all four types of ideal observers 
can be described using the same closed-form expression. We 
then address the following questions: 1) How does the perfor- 
mance of the generalized ideal observer compare to the area 
under the ROC curve? 2) Is it possible to determine a priori 
the number of input samples required so that the estimated 
value of the observer's performance will lie within a specified 
confidence interval at a specified confidence level? 3) Are these 
estimates unbiased and consistent, i.e., does estimation error 
decrease with increasing number of observations and at what 
rate? 4) Do efficient (minimum variance) estimators exist for 
the performance of these ideal observers? 5) Is the standard 
method of estimating performance (area under the ROC curve) 
efficient? Answers to these questions will facilitate a more effi- 
cient design of neurophysiological experiments for ideal observer 
analysis. 

RESULTS 

GENERALIZED IDEAL OBSERVER EQUATIONS 

In the notation introduced above, consider an ideal observer
model with inputs $r_0$ and $r_1$. Let $S_0$ and $S_1$ be the two experimental
conditions associated with $r_0$ and $r_1$, respectively. The
probability distributions $P_0(r_0)$ and $P_1(r_1)$ are given by the conditional
distributions $P_0(r_0) = P_0(r_0|S_0)$ and $P_1(r_1) = P_1(r_1|S_1)$.
The ideal observer, who has no a priori knowledge of which
input sample comes from which condition, makes a prediction
to that effect using a "decision rule". If the observer predicts
that $r_0$ comes from the condition $S_0$ (or, equivalently, from the
distribution $P_0(r_0|S_0)$) and that $r_1$ comes from $S_1$ (i.e., from
$P_1(r_1|S_1)$), then the observer will be correct. The opposite association
will be incorrect. The variables $r_0$ and $r_1$ may represent
the frequency of impulses emitted by the neuron. Without loss
of generality, assume that the values of $r_0$ and $r_1$ lie within
the upper right quadrant of the real plane, i.e., the sample
space consists of all points $\mathbf{r} = (r_0, r_1) \in \mathbb{R}^+ \times \mathbb{R}^+$. The decision
region $\mathcal{D} \subset \mathbb{R}^+ \times \mathbb{R}^+$ consists of all values of $r_0$ and $r_1$
for which the ideal observer makes a correct prediction. Then
the probability of correct prediction for this ideal observer is
given by



$$P = \int_{\mathcal{D}} p(r_0, r_1)\, d\mathbf{r}, \tag{1}$$



where $p(r_0, r_1)$ is the joint probability density function corresponding
to the joint probability distribution $P(r_0, r_1)$. In
many experiments, the responses to the two conditions are independent
random variables. Hence $P(r_0, r_1) = P_0(r_0|S_0)\,P_1(r_1|S_1)$.
Furthermore, the optimal decision variable (e.g., the likelihood
ratio) or its sufficient statistic involves monotone functions of
the two variables $r_0$ and $r_1$, thereby resulting in a partition
of the sample space $\mathbb{R}^+ \times \mathbb{R}^+$ into a decision region of the
form $\mathcal{D} = \{(r_0, r_1) \in \mathbb{R}^+ \times \mathbb{R}^+ \mid r_1 > r_0\}$. Substituting this
integration region into Equation (1) and choosing the summation
of the elemental areas along the two possible directions yields
two equivalent expressions for the performance of the ideal
observer as



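$$P(p_0(r_0), p_1(r_1)) = \int_0^\infty p_1(r_1)\left\{\int_0^{r_1} p_0(r_0)\,dr_0\right\}dr_1 \tag{2}$$

and

$$P(p_0(r_0), p_1(r_1)) = \int_0^\infty p_0(r_0)\left\{\int_{r_0}^\infty p_1(r_1)\,dr_1\right\}dr_0 = 1 - E_{p_0}[P_1] \tag{3}$$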



where $E_{f(x)}[G(x)] = \int G(x) f(x)\,dx$ denotes the expectation of the
function $G$ with respect to the probability density function $f$, and
$p_i(r_i)$, $i = 0, 1$, are the marginal probability density functions. It is
important to note that $P(p_0(r_0), p_1(r_1)) = 1 - P(p_1(r_1), p_0(r_0))$
and therefore the order of the two distributions in the argument of
$P(\cdot, \cdot)$ cannot be exchanged.

This general ideal observer gives rise to the four spe- 
cific ideal observers described above. In simple detection 
tasks, the two stimulus conditions are typically $S_1$ =
"Stimulus present" and $S_0$ = "Stimulus absent." Choose
$p_0(r_0) = \delta(r_0 - N_c)$, where $\delta(x)$ is the Dirac delta function such
that $\int_{-\infty}^{\infty} \delta(x)\,dx = 1$ and $\delta(x) = 0\ \forall x \neq 0$. Then Equation (2)
simplifies to



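$$\begin{aligned}
P(p_0(r_0), p_1(r_1)) &= \int_0^\infty p_1(r_1)\left\{\int_0^{r_1} p_0(r_0)\,dr_0\right\}dr_1 \\
&= \int_0^\infty p_1(r_1)\left\{\int_0^{N_c} \delta(r_0 - N_c)\,dr_0 + \int_{N_c}^{r_1} \delta(r_0 - N_c)\,dr_0\right\}dr_1 \\
&= \int_{N_c}^\infty p_1(r_1)\,dr_1,
\end{aligned}$$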



which is the probability $P_1(r_1 \ge N_c)$, the hit rate in the detection
task. Thus, for the choice of $p_0(r_0) = \delta(r_0 - N_c)$, the general
ideal observer model simplifies to the first category of ideal
observers that signal the presence of a stimulus whenever the
response of the neuron under consideration equals or exceeds
the fixed criterion number $N_c$. The second category of ideal
observers used in detection tasks differs from the first only in
the choice of the criterion number $N_c$. Therefore these models
can be derived using $p_0(r_0) = \delta(r_0 - N_c)$, where $N_c$ is now
determined using the inequality $\int_{N_c}^{\infty} p_0(r_0)\,dr_0 \le \alpha$ (in this
inequality $p_0$ denotes the distribution of the maintained discharge
and $\alpha$ the permitted false alarm rate). It is also
clear that the general observer fully describes the third category
of ideal observers used to quantify neural detection and
discrimination performance in 2-AFC tasks. Finally, for the
fourth category of ideal observers, the two "stimulus" conditions
need to be replaced with the two "choices" available to
the subject. Thus, this general ideal observer provides a complete
description of the four types of ideal observers considered
above. We should note that this generalization does not
imply that all four categories of ideal observers are functionally
or physiologically equivalent. This generalization is purely
mathematical and provides a unified framework for the following
analyses.
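
The reduction of the general observer to a fixed-criterion detector can be checked numerically. The sketch below is illustrative only: it replaces the double integral of Equation (2) with a simple count over sample pairs, represents the delta function by a sample set that contains only the value $N_c$, and uses hypothetical Gaussian responses and a hypothetical criterion.

```python
import random


def performance_estimate(r0_samples, r1_samples):
    """Fraction of (r1, r0) pairs with r0 <= r1, in the spirit of Equation (2)."""
    count = sum(1 for r1 in r1_samples for r0 in r0_samples if r0 <= r1)
    return count / (len(r0_samples) * len(r1_samples))


rng = random.Random(1)
n_c = 12.0  # hypothetical criterion
r1 = [rng.gauss(14.0, 3.0) for _ in range(500)]  # stimulus-present responses

# A point mass at N_c plays the role of p0(r0) = delta(r0 - N_c).
r0_point_mass = [n_c] * 500

hit_rate = sum(1 for r in r1 if r >= n_c) / len(r1)
general = performance_estimate(r0_point_mass, r1)
print(hit_rate, general)
```

Under these assumptions the pair-counting estimate and the directly counted hit rate coincide, which mirrors the simplification of Equation (2) shown above.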

ESTIMATORS FOR THE PERFORMANCE OF THE IDEAL OBSERVER 

Suppose $R_1 = [r_{11}\ r_{12} \ldots r_{1i} \ldots r_{1N}]$ and
$R_0 = [r_{01}\ r_{02} \ldots r_{0k} \ldots r_{0N}]$ are two sets of N samples each obtained
in the experiment from the conditions $S_1$ and $S_0$, respectively.
In the above notation, the elements of $R_1$ are independent
and identically distributed as $P_1$ and those of $R_0$ are similarly
drawn from $P_0$. $I_{[0,x]}(y)$ is the indicator function of $y$ on
the closed interval $[0, x]$ such that $I_{[0,x]}(y) = 1$ iff $y \in [0, x]$
and 0 otherwise. Then, based on Equation (2), an estimator
of the performance of the generalized ideal observer as
a function of the samples $R_0$ for a given value of $r_{1i}$ can be
proposed as



$$\hat{P}(R_0 \mid r_{1i}) = \frac{1}{N}\sum_{k=1}^{N} I_{[0,\,r_{1i}]}(r_{0k}).$$



This provides an estimate for the inner integral in Equation (2),
given a value of $r_{1i}$. Using all 2N samples of both $R_1$ and $R_0$,
$P$ can be estimated as



$$\hat{P}(R_0, R_1) = \frac{1}{N^2}\sum_{i=1}^{N}\sum_{k=1}^{N} I_{[0,\,r_{1i}]}(r_{0k}). \tag{4}$$



The estimator based on Equation (3) can be similarly 
obtained as 



$$\hat{P}(R_0, R_1) = \frac{1}{N^2}\sum_{k=1}^{N}\sum_{i=1}^{N} I_{[r_{0k},\,\infty)}(r_{1i}). \tag{5}$$



Equation (4) provides one simple way to estimate the performance
of the generalized ideal observer. We pick one sample
from $R_1$, say $r_{1i}$, and count the number of samples of $R_0$ that
are less than or equal to $r_{1i}$. We repeat this for all samples in
$R_1$ and divide the result by $N^2$. Equation (5) provides a similar
method. Computationally, this sequence of operations can
be rearranged to resemble the operations involved in computing
the area under the ROC curve for the normalized frequency
histograms constructed from $R_0$ and $R_1$. Thus, there are at least
three different methods to estimate the performance of this ideal
observer. We show below that all three methods compute the
area under the ROC curve, empirically constructed from $R_0$
and $R_1$.
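
The three computations can be sketched as follows. This is a minimal illustration, not the authors' code: `estimator_eq4` follows the counting procedure described for Equation (4), `estimator_eq5` assumes that Equation (5) counts, for each sample of $R_0$, the samples of $R_1$ that are greater than or equal to it, and `roc_area` is one possible trapezoidal construction of the empirical ROC area. The Gaussian samples in the usage lines are hypothetical.

```python
import random


def estimator_eq4(r0, r1):
    """For each r1 sample, count the r0 samples that are <= it."""
    total = sum(1 for x1 in r1 for x0 in r0 if x0 <= x1)
    return total / (len(r0) * len(r1))


def estimator_eq5(r0, r1):
    """For each r0 sample, count the r1 samples that are >= it (assumed form)."""
    total = sum(1 for x0 in r0 for x1 in r1 if x1 >= x0)
    return total / (len(r0) * len(r1))


def roc_area(r0, r1):
    """Trapezoidal area under the empirical ROC curve."""
    # Sweep the criterion over every observed value, from high to low,
    # tracing (false alarm, hit) points from (0, 0) to (1, 1).
    criteria = sorted(set(r0) | set(r1), reverse=True)
    points = [(0.0, 0.0)]
    for t in criteria:
        alpha = sum(1 for x in r0 if x >= t) / len(r0)
        beta = sum(1 for x in r1 if x >= t) / len(r1)
        points.append((alpha, beta))
    area = 0.0
    for (a_prev, b_prev), (a_next, b_next) in zip(points, points[1:]):
        area += (a_next - a_prev) * (b_prev + b_next) / 2.0
    return area


rng = random.Random(2)
r0 = [rng.gauss(10.0, 2.0) for _ in range(100)]  # hypothetical responses, condition S0
r1 = [rng.gauss(12.0, 2.0) for _ in range(100)]  # hypothetical responses, condition S1
print(estimator_eq4(r0, r1), estimator_eq5(r0, r1), roc_area(r0, r1))
```

The agreement checked here is for continuous-valued samples, where ties between the two sets do not occur.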

RELATIONSHIP TO THE AREA UNDER THE ROC CURVE 

For a fixed criterion $T$, the hit rate ($\beta$) and false alarm rate ($\alpha$) are

$$\beta(T) = \int_T^\infty p_1(r_1)\,dr_1 \tag{6}$$

and

$$\alpha(T) = \int_T^\infty p_0(r_0)\,dr_0. \tag{7}$$



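Using Equation (6), we can rewrite the expression for the performance
of the ideal observer in Equation (3) as

$$\begin{aligned}
P(p_0(r_0), p_1(r_1)) &= \int_0^\infty p_0(r_0)\left\{\int_{r_0}^\infty p_1(r_1)\,dr_1\right\}dr_0 \\
&= \int_0^\infty p_0(r_0)\,\beta(r_0)\,dr_0 \\
&= \int_0^\infty \beta(r_0)\,dP_0(r_0).
\end{aligned}$$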



Using Equation (7) in the above, we have the performance of the
ideal observer as

$$P(p_0(r_0), p_1(r_1)) = \int \beta(r_0)\,dP_0(r_0) = \int \beta\,d\alpha. \tag{8}$$



Since the ROC curve is the plot of $\beta$ against $\alpha$ as the criterion
varies from 0 to $\infty$, the quantity $\int \beta\,d\alpha$ is the area under the
ROC curve (Figure 1A). Therefore, estimates of the quantities in
Equations (2) and (3) are also estimators of the area under the
ROC curve.

[Figure 1 panels. Axis labels: probability of false alarm ($\alpha$); difference between mean values; area under the ROC curve. Legend: o Equation (4), + Equation (5).]

FIGURE 1 | Relationship of the derived estimators to the area under
the ROC curve. (A) Area under the ROC curve $\int \beta\,d\alpha$ has the equivalent
definitions given in Equations (2) and (3) and admits the estimators given in
Equations (4) and (5). (B) Single estimates of the area under the ROC curve
and Equations (4) and (5) are shown comparatively for a progressively
increasing difference in the mean firing rates for Gaussian distributions. The
points are predominantly coincident. (C) The percent error for single
estimates lies within 2%. (D) When 100 such trial estimates are averaged
together, the percent error falls close to 0%. These differences in the
estimates are not systematic and are entirely due to numerical errors. (E,F)
Same as (C,D) but for Poisson distributions. In this case the errors
decrease monotonically from about 5 to close to 0%.

Figures 1, 2 numerically illustrate the fact that estimators (4)
and (5) are equivalent to the conventional estimate of performance
as the area under the ROC curve. For Figure 1, we assumed
Gaussian distributions for $P_0$ and $P_1$ with the mean of $P_1$ greater
than that of $P_0$. A random set of 100 samples was drawn from
each distribution and the area under the ROC curve was estimated.
Performance was also estimated using Equations (4) and
(5). The difference between the mean values of the Gaussian distributions
was then increased in the range [0.5, 25] in steps of 0.5.
The variances were set to $1.28 \times \text{mean}^{1.2}$ to mimic the firing rate
statistics of MT neurons (Britten et al., 1992; Purushothaman
and Bradley, 2005). The estimates were computed for this entire
range of mean values (Figure 1B). The deviation of the estimators
(4) and (5) from the area under the ROC curve (computed in
the traditional manner) was evaluated as Percent error = 100 ×
(Area under ROC − estimate) / estimate. The three estimates differed
by less than 1.5% from each other (Figure 1C). When
the estimates were averaged over 100 repetitions, the errors
became negligible (Figure 1D). Simulations with Poisson distributions
showed errors in the range of 0-5% (Figures 1E,F).
For Figure 2, we again assumed Gaussian distributions for $P_0$
and $P_1$ with the mean of $P_1$ greater than that of $P_0$. However,
in these simulations, the difference between the mean values
was held constant while the ratio of the variance of $P_1$ to
that of $P_0$ was increased in the range [1, 25]. The percent error
was computed as above. These simulations also showed that
the estimates averaged over 100 repetitions had negligible error
(Figure 2C).

[Figure 2 panels. Axis labels: ratio of variances; area under the ROC curve. Legend: Equation (4), Equation (5).]

FIGURE 2 | Derived estimators and area under ROC curve as a function
of variances. (A) Single estimates of the area under the ROC curve and
Equations (4) and (5) are compared for progressively increasing ratio of
variances for Gaussian distributions. The points are predominantly
coincident. (B) The percent error for single estimates lies within 2%. (C)
When 100 such trial estimates are averaged together, the percent error falls
close to 0%.



UNBIASED ESTIMATION OF THE IDEAL OBSERVER PERFORMANCE

It is easy to verify that the estimators given in Equations (4)
and (5) are unbiased, i.e., their expected values are equal to the
true value to be estimated (Van Trees, 1966, pp. 65-73). We
note that $\hat{P}(R_0, R_1)$ is a joint transformation of the independent
random variables $r_{0k}$ and $r_{1i}$, $i, k = 1, 2, \ldots, N$, and that $r_{ik}$,
$k = 1, 2, \ldots, N$, are identically distributed for each $i$. Therefore
the expected value of the estimator in Equation (4) can be
computed as

$$E[\hat{P}] = \int_0^\infty \int_0^\infty \cdots \int_0^\infty \hat{P}(R_0, R_1) \left(\prod_{k=1}^{N} p_0(r_{0k})\,dr_{0k}\right)\left(\prod_{i=1}^{N} p_1(r_{1i})\,dr_{1i}\right). \tag{9}$$

Substituting Equation (4) into the above equation, we get

$$\begin{aligned}
E[\hat{P}(R_0, R_1)] &= \frac{1}{N^2}\sum_{i=1}^{N}\sum_{k=1}^{N} E\left[I_{[0,\,r_{1i}]}(r_{0k})\right] \\
&= \frac{1}{N^2}\sum_{i=1}^{N}\sum_{k=1}^{N} \int_0^\infty p_1(r_{1i})\left(\int_0^{r_{1i}} p_0(r_{0k})\,dr_{0k}\right)dr_{1i} \\
&= P(p_0, p_1).
\end{aligned}$$

Therefore, $\hat{P}$ in Equation (4) is an unbiased estimator of $P$.
Similarly, it can be shown that the estimator of Equation (5) is
also unbiased.

VARIANCE OF THE ESTIMATOR

The variances of the estimators in Equations (4) and (5) can be
computed by first subtracting $P$ from both sides of Equation (4)
and squaring them:

$$\begin{aligned}
N^4(\hat{P} - P)^2 &= \left(\sum_{i=1}^{N}\sum_{k=1}^{N}\left(I_{[0,\,r_{1i}]}(r_{0k}) - P\right)\right)^2 \\
&= \sum_{i=1}^{N}\sum_{k=1}^{N}\left(I_{[0,\,r_{1i}]}(r_{0k}) - P\right)^2
 + \sum_{i=1}^{N}\sum_{k=1}^{N}\sum_{j=1}^{N}\sum_{\substack{m=1 \\ (j,m) \neq (i,k)}}^{N}\left(I_{[0,\,r_{1i}]}(r_{0k}) - P\right)\left(I_{[0,\,r_{1j}]}(r_{0m}) - P\right) \\
&= S_1 + S_2.
\end{aligned} \tag{10}$$

Expanding the summand of $S_1$ as $\left(I_{[0,\,r_{1i}]}(r_{0k}) - P\right)^2 =
I^2_{[0,\,r_{1i}]}(r_{0k}) + P^2 - 2P\,I_{[0,\,r_{1i}]}(r_{0k})$ and noting that
$E[I_{[0,\,r_{1i}]}(r_{0k})] = P$, we obtain for the expectation of the first term,
$E[S_1] = N^2 P(1 - P)$. Next, we rewrite the second sum as



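$$\begin{aligned}
S_2 &= \sum_{i=1}^{N}\sum_{\substack{j=1 \\ j \neq i}}^{N}\sum_{k=1}^{N}\left(I_{[0,\,r_{1i}]}(r_{0k}) - P\right)\left(I_{[0,\,r_{1j}]}(r_{0k}) - P\right) \\
&\quad + \sum_{i=1}^{N}\sum_{k=1}^{N}\sum_{\substack{m=1 \\ m \neq k}}^{N}\left(I_{[0,\,r_{1i}]}(r_{0k}) - P\right)\left(I_{[0,\,r_{1i}]}(r_{0m}) - P\right) \\
&\quad + \sum_{i=1}^{N}\sum_{k=1}^{N}\sum_{\substack{j=1 \\ j \neq i}}^{N}\sum_{\substack{m=1 \\ m \neq k}}^{N}\left(I_{[0,\,r_{1i}]}(r_{0k}) - P\right)\left(I_{[0,\,r_{1j}]}(r_{0m}) - P\right) \\
&= S_{21} + S_{22} + S_{23}.
\end{aligned} \tag{11}$$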



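Consider the first sum on the right side of Equation (11) above.
We compute the expectation of the product $I_{[0,\,r_{1i}]}(r_{0k})\,I_{[0,\,r_{1j}]}(r_{0k})$
for $j \neq i$ as

$$\begin{aligned}
E\left[I_{[0,\,r_{1i}]}(r_{0k})\,I_{[0,\,r_{1j}]}(r_{0k})\right]
&= \int_0^\infty\!\!\int_0^\infty p_1(r_{1i})\,p_1(r_{1j})\left(\int_0^\infty I_{[0,\,r_{1i}]}(r_{0k})\,I_{[0,\,r_{1j}]}(r_{0k})\,p_0(r_{0k})\,dr_{0k}\right)dr_{1i}\,dr_{1j} \\
&= \int_0^\infty\!\!\int_0^\infty p_1(r_{1i})\,p_1(r_{1j})\left(\int_0^{\min(r_{1i},\,r_{1j})} p_0(r_{0k})\,dr_{0k}\right)dr_{1i}\,dr_{1j}.
\end{aligned} \tag{12}$$

Since $\min(r_{1i}, r_{1j}) \le r_{1i}$, we get the following bound:

$$\begin{aligned}
E\left[I_{[0,\,r_{1i}]}(r_{0k})\,I_{[0,\,r_{1j}]}(r_{0k})\right]
&\le \int_0^\infty\!\!\int_0^\infty p_1(r_{1i})\,p_1(r_{1j})\left(\int_0^{r_{1i}} p_0(r_{0k})\,dr_{0k}\right)dr_{1i}\,dr_{1j} \\
&= \int_0^\infty p_1(r_{1i})\left(\int_0^{r_{1i}} p_0(r_{0k})\,dr_{0k}\right)dr_{1i} = P.
\end{aligned} \tag{13}$$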



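Therefore, we have for the expectation of the first term on the
right side of Equation (11) the bound $E[S_{21}] \le N^2(N - 1)[P(1 - P)]$.
Now consider the second sum. The expectation of the product
$I_{[0,\,r_{1i}]}(r_{0k})\,I_{[0,\,r_{1i}]}(r_{0m})$ for $m \neq k$ is given by

$$\begin{aligned}
E\left[I_{[0,\,r_{1i}]}(r_{0k})\,I_{[0,\,r_{1i}]}(r_{0m})\right]
&= \int_0^\infty p_1(r_{1i})\left(\int_0^\infty I_{[0,\,r_{1i}]}(r_{0k})\,p_0(r_{0k})\,dr_{0k}\right)\left(\int_0^\infty I_{[0,\,r_{1i}]}(r_{0m})\,p_0(r_{0m})\,dr_{0m}\right)dr_{1i} \\
&= \int_0^\infty p_1(r_{1i})\left(\int_0^{r_{1i}} p_0(r_{0k})\,dr_{0k}\right)\left(\int_0^{r_{1i}} p_0(r_{0m})\,dr_{0m}\right)dr_{1i}
\end{aligned} \tag{14}$$

$$\le \int_0^\infty p_1(r_{1i})\left(\int_0^{r_{1i}} p_0(r_{0k})\,dr_{0k}\right)dr_{1i} = P, \tag{15}$$

where we used the bound $\int_0^{r_{1i}} p_0(r_{0m})\,dr_{0m} \le 1$ in Equation
(14). Therefore, we have for the expectation of $S_{22}$ the bound
$E[S_{22}] \le N^2(N - 1)[P(1 - P)]$. Finally, we note that in the last
term $S_{23}$, the summand $\left(I_{[0,\,r_{1i}]}(r_{0k}) - P\right)\left(I_{[0,\,r_{1j}]}(r_{0m}) - P\right)$ is the
product of two independent and zero-mean random variables for
$j \neq i$ and $m \neq k$, and therefore has zero expectation. Hence the
variance of the estimator in Equation (4) has the bound

$$E(\hat{P} - P)^2 \le P(1 - P)\left[\frac{1}{N^2} + \frac{2(N - 1)}{N^2}\right] = P(1 - P)\left[\frac{2N - 1}{N^2}\right].$$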



Similar calculations yield the same bound for the variance of the 
estimator in Equation (5). 
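
The bound can be compared with a simulation. The sketch below is illustrative and makes several assumptions that are not part of the derivation: responses are Gaussian with equal standard deviations (so that the true performance is $\Phi((\mu_1 - \mu_0)/(\sigma\sqrt{2}))$), the parameter values are hypothetical, and `empirical_variance` is a helper name introduced here.

```python
import random
from bisect import bisect_right
from statistics import NormalDist


def estimator_eq4(r0, r1):
    """Pair-counting estimate: fraction of (r1, r0) pairs with r0 <= r1."""
    sorted_r0 = sorted(r0)
    total = sum(bisect_right(sorted_r0, x1) for x1 in r1)
    return total / (len(r0) * len(r1))


def variance_bound(p, n):
    """Right-hand side of the bound derived above: P(1 - P)(2N - 1)/N^2."""
    return p * (1.0 - p) * (2 * n - 1) / n**2


def empirical_variance(n, mu0, mu1, sigma, repeats, rng):
    """Mean squared deviation of the estimate about the true performance."""
    # True performance for two equal-variance Gaussians (an assumption here).
    true_p = NormalDist().cdf((mu1 - mu0) / (sigma * 2**0.5))
    sq_errors = []
    for _ in range(repeats):
        r0 = [rng.gauss(mu0, sigma) for _ in range(n)]
        r1 = [rng.gauss(mu1, sigma) for _ in range(n)]
        sq_errors.append((estimator_eq4(r0, r1) - true_p) ** 2)
    return true_p, sum(sq_errors) / repeats


rng = random.Random(3)
for n in (25, 50, 100):
    true_p, var_hat = empirical_variance(n, 10.0, 12.0, 2.0, repeats=400, rng=rng)
    print(n, round(true_p, 3), var_hat, variance_bound(true_p, n))
```

Printing the simulated variance next to the bound for increasing N gives a quick view of how both shrink with the number of observations.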

CONSISTENCY OF THE ESTIMATOR 

Next, we verify whether the estimators are consistent, i.e., whether the estimates
progressively converge to the true value as the number of observations
is increased (Van Trees, 1966, pp. 65-73). To do so, we first
apply the Tchebycheff-Bienaymé inequality to $\hat{P}$. For any $\epsilon > 0$,
we have

$$\text{Prob}\{|\hat{P} - P| > \epsilon\} \le \frac{\text{Var}(\hat{P})}{\epsilon^2} \le \frac{P(1 - P)}{\epsilon^2}\left[\frac{2N - 1}{N^2}\right]. \tag{16}$$

Thus $\hat{P}$ converges to $P$ in probability as $N \to \infty$ and is a consistent
estimator of $P$.
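
The right-hand side of Equation (16) is straightforward to evaluate. The following minimal sketch does so for a few trial counts; the function name `chebyshev_bound` and the values of the true performance and tolerance are assumptions for illustration, and the bound is capped at 1 because it is a probability.

```python
def chebyshev_bound(p, n, eps):
    """Upper bound from Equation (16) on Prob{|P_hat - P| > eps}, capped at 1."""
    bound = p * (1.0 - p) * (2 * n - 1) / (n**2 * eps**2)
    return min(1.0, bound)


# Hypothetical true performance and tolerance, evaluated for several trial counts.
for n in (100, 1000, 10000):
    print(n, chebyshev_bound(p=0.8, n=n, eps=0.05))
```

Because the bound scales roughly as 2/N for large N, a tenfold increase in the number of observations lowers it by about a factor of ten.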

DEVIATION OF AN ESTIMATE FROM THE TRUE VALUE 

The above analyses showed that the proposed estimators give an
unbiased estimate of the performance of the ideal observer and
that as the number of observations increases, the error of estimation
(i.e., the variance of the estimator) decreases at the rate
of 1/N. In addition to establishing these properties, the above
analyses also give us tools for designing the ideal observer model.
Suppose the experiment has been performed and an estimate of
the performance of the ideal observer has been obtained for a
neuron. It is desirable to determine the likelihood that the true
value of the performance lies within a known range of the estimate
obtained, i.e., we would like to state a confidence interval
for the estimate at a given significance level. Currently, this confidence
interval, when reported, is obtained using bootstrapping or
other empirical methods. The above analyses provide a tool for
quantifying the deviation of a performance estimate from its true
value in a simpler and more rigorous manner. Equation (16) can
be used for this purpose. Suppose we require the percent error
in the estimate, $100 \times |\hat{P} - P|/P$, to be less than 5%. This gives
$\epsilon = 0.05 \times P$, from which the probability that the true value lies
outside this error range can be computed as



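$$\text{Prob}\left[100 \times \frac{|\hat{P} - P|}{P} > 5\%\right] = \text{Prob}\left[|\hat{P} - P| > 0.05 \times P\right] \le \frac{P(1 - P)}{(0.05 \times P)^2}\left[\frac{2N - 1}{N^2}\right]. \tag{17}$$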



Thus, the quantity $\alpha \triangleq \frac{P(1 - P)}{(0.05 \times P)^2}\left[\frac{2N - 1}{N^2}\right]$ gives the significance
level for the desired confidence interval. We note that since
$|\hat{P} - P| > \epsilon$, $\alpha$ does not necessarily depend upon the unknown $P$. For
large $N$, $2N \gg 1$. Hence $\alpha \approx 2P(1 - P)/[N(0.05 \times P)^2]$.



We investigated the tightness of this bound using a series of
simulations (Figure 3). We simulated N trials by drawing N samples
of $R_0$ and $R_1$, each, from Gaussian distributions whose mean
values differed by progressively increasing amounts so that the
true value of the ideal observer performance varied from 0.5 to
1.0. For each set $(R_0, R_1)$, we obtained one estimate of $P$. We
performed this simulation 1000 times and computed the maximum
deviation of the estimate from the true value, the average
deviation and the minimum deviation for the 1000 estimates.
We repeated all of these simulations for Gamma distributions.
The results are shown superimposed on the corresponding values
of $\epsilon$ for $\alpha$ values of 0.01 and 0.05 (Figure 3). The same pattern
of results was obtained for Poisson and scaled Poisson distributions.
These simulations show that for small values of $N$ ($\le 100$)
and $\alpha$ (= 0.01), the actual difference between the true and
estimated values is much smaller than the theoretical bound $\epsilon$.
At $\alpha = 0.05$ and for higher values of $N$, the theoretical deviation
approaches the maximum empirical deviation obtained in
the simulations. The implications of the varying tightness of the
theoretical bound for experimental design are discussed below.

DESIGNING EXPERIMENTS FOR RELIABLE ESTIMATION OF IDEAL 
OBSERVER PERFORMANCE 

Some previous studies have empirically investigated the number
of trials required to obtain a reliable estimate of the ideal
observer's performance. For example, Britten et al. (1996) computed
"choice probability" separately for odd and even numbered
trials. This allowed them to compute a measure of the random
dispersion of the probability values. One goal of that investigation
was to test whether or not the population average choice
probability was significantly different from chance. For the population
average choice probability of 0.55, at least 100 trials were
required for the odd and even estimates to differ by less than 0.05
(i.e., 0.55-0.5). A different empirical approach was required to
estimate the number of trials required to significantly reduce estimation
errors in the ROC analysis of optically imaged intrinsic
signals (Purushothaman et al., 2009).

FIGURE 3 | Tightness of the bound in Equation (17). Results are shown for
Gaussian (top row) and Gamma (bottom row) distributions. The difference
between the mean values was progressively increased so that the true value
of the ideal observer performance varied from 0.5 to 1.0. This performance is
plotted on the X-axis. The performance was estimated 1000 times and the
maximum deviation of the estimate from the true value, the average deviation,
and the minimum deviation were computed. The corresponding values of $\epsilon$ are
also plotted on all the graphs. The effect of varying $\alpha$ (0.01 and 0.05) for a fixed
$N$ (100) is shown in the left and middle columns. The effect of varying $N$ (100
and 250) for a fixed $\alpha$ (0.05) is shown in the middle and right columns.

From the results obtained in the previous section, we can arrive at a general
formula for systematically determining the number of trials required for the
estimate of the performance of the generalized ideal observer to reach a
desired confidence interval. From Equation (16) above, we have, ∀ε > 0,



$$
\mathrm{Prob}\{|\hat{P} - P| > \varepsilon\} < \frac{P(1 - P)}{\varepsilon^{2}} \cdot \frac{2N - 1}{N^{2}} \qquad (18)
$$

Table 1 | The confidence interval (ε) and the number of trials (N) are shown
for various true values of P.



First, as an example, we consider the Britten et al. (1996) study. Assume that
the true value of choice probability in that study was 0.55. Suppose we
require that the estimate should lie within ±0.05 of the true value at an
alpha (or significance) level of 0.05, i.e., we require
Prob{|P̂ − P| > 0.05} < 0.05 so that, in concordance with the empirical test
performed by Britten et al. (1996), the dispersion in the choice probability
estimate reliably excludes the chance value of 0.5. Then the number of trials
N should be at least 2√(0.55(1 − 0.55)/(0.05² × 0.05)) ≈ 89. The empirical
test by Britten et al. (1996) yielded N ≈ 100, quite close to this value.
However, the above formula also allows us to determine N at other significance
levels. At a significance level of 0.01, we get N > 198.
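The arithmetic in this example can be checked with a short script. The sketch
below is only illustrative: the function name `trials_required` is
hypothetical, and it simply evaluates the expression used in the worked
example, N ≥ 2√(P(1 − P)/(ε²α)), rounding up to the next integer.

```python
import math


def trials_required(p_true: float, eps: float, alpha: float) -> int:
    """Smallest integer N with N >= 2 * sqrt(P(1 - P) / (eps^2 * alpha)).

    This mirrors the expression evaluated in the worked example in the text;
    the function name and signature are illustrative assumptions.
    """
    if not 0.0 < p_true < 1.0:
        raise ValueError("p_true must lie strictly between 0 and 1")
    if eps <= 0.0 or not 0.0 < alpha < 1.0:
        raise ValueError("eps must be positive and alpha must lie in (0, 1)")
    bound = 2.0 * math.sqrt(p_true * (1.0 - p_true) / (eps ** 2 * alpha))
    return math.ceil(bound)


if __name__ == "__main__":
    # Choice probability of 0.55, estimate required within +/-0.05 of it.
    for alpha in (0.05, 0.01):
        print(alpha, trials_required(0.55, 0.05, alpha))
```

With these inputs the script reports 89 trials at a significance level of 0.05
and 199 trials at a significance level of 0.01, in line with the values quoted
in the example.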

While many studies that followed Britten et al. (1996) have used this "100
trials" rule to determine N, our analysis shows that fewer trials suffice when
higher values are expected for the performance of the ideal observer. For
example, multistable percepts are linked to fluctuations in neural activity
quite strongly (Dodd et al., 2002) and neurons in higher brain areas also show
a strong link between their activity and perceptual decisions (Shadlen and
Newsome, 2001). Using Table 1 and Equation (18), it is possible to estimate
the required value of N during experimental design. It is also possible to
estimate confidence intervals (i.e., ε) for a given value of N during data
analysis without resorting to numerical simulations. Table 1 provides a
look-up of ε and N for various values of P. As mentioned above, our
simulations showed that at a given value of N and α, the actual deviation
between the true and estimated values was much smaller than the theoretical
bound set at ε (Figure 1). Therefore, the values of N shown in Table 1 are
likely to be overestimates, i.e., fewer trials might suffice to reach the
desired confidence interval in some cases.
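The gap between a theoretical bound and the deviations actually observed can
be illustrated with a small Monte Carlo experiment. The sketch below is not
the simulation used for the figures. It assumes unit-variance Gaussian
response distributions, assumes that the estimate is the fraction of the N²
response pairs in which the response from condition 1 exceeds the response
from condition 0, and assumes that the bound takes the Chebyshev form
P(1 − P)(2N − 1)/(ε²N²) = α, solved for ε. All function names are
hypothetical.

```python
import bisect
import math
import random


def pairwise_estimate(r0, r1):
    """Fraction of the len(r0) * len(r1) pairs with r1 > r0 (ties count as 0)."""
    r0_sorted = sorted(r0)
    # bisect_left counts how many r0 values are strictly below each r1 value.
    wins = sum(bisect.bisect_left(r0_sorted, x) for x in r1)
    return wins / (len(r0) * len(r1))


def chebyshev_eps(p_true, n, alpha):
    """eps solving P(1-P)(2N-1) / (eps^2 N^2) = alpha (assumed form of the bound)."""
    return math.sqrt(p_true * (1.0 - p_true) * (2 * n - 1) / (alpha * n ** 2))


def max_deviation(delta_mu, n, repeats, seed=0):
    """Return (true P, largest |estimate - true P|) over simulated experiments."""
    rng = random.Random(seed)
    # For unit-variance Gaussians, P(r1 > r0) = Phi(delta_mu / sqrt(2)).
    p_true = 0.5 * (1.0 + math.erf(delta_mu / 2.0))
    worst = 0.0
    for _ in range(repeats):
        r0 = [rng.gauss(0.0, 1.0) for _ in range(n)]
        r1 = [rng.gauss(delta_mu, 1.0) for _ in range(n)]
        worst = max(worst, abs(pairwise_estimate(r0, r1) - p_true))
    return p_true, worst


if __name__ == "__main__":
    p_true, worst = max_deviation(delta_mu=1.0, n=100, repeats=1000)
    print(p_true, worst, chebyshev_eps(p_true, 100, 0.05))
```

Under these assumptions the largest deviation across repeats stays well inside
the computed ε, which is the qualitative behavior described above.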

P        ε           N (α = 0.01)    N (α = 0.05)
0.525    10% of P    96              43
0.550    10% of P    91              41
0.575    10% of P    86              38
0.6      10% of P    82              37
0.525    5% of P     191             85
0.550    5% of P     181             81
0.575    5% of P     172             77
0.6      5% of P     164             73
0.625    10% of P    78              35
0.650    10% of P    73              33
0.675    10% of P    70              31
0.7      10% of P    66              29
0.625    5% of P     155             70
0.650    5% of P     146             65
0.675    5% of P     139             63
0.7      5% of P     131             59

The expression for obtaining the number of trials required to reach a given
confidence interval ε at a significance level α is N ≈ (1/ε)√(P(1 − P)/α).
Alternatively, for given values of N and α, the confidence interval can be
computed as ε ≈ (1/N)√(P(1 − P)/α).

EFFICIENT ESTIMATORS OF IDEAL OBSERVER PERFORMANCE
MAY NOT EXIST

Since the performance of the ideal observer can be estimated in more than one
way, it is natural to ask if some of these methods are "better" than others.
In addition to requiring that estimators be unbiased and consistent, it is
also required that estimators should be "efficient" when possible (Van Trees,
1966, pp. 66-73). An efficient estimator has the minimum possible variance
among all unbiased estimators for a quantity and therefore will yield the
lowest possible error for a given number of observations, on average. Under
some conditions, maximum likelihood (ML) estimators are minimum variance
estimators. Therefore, it is natural to seek ML estimators for the performance
of the ideal observer model. In this section, we first show that P̂(R₀, R₁) is
"efficient" in a limited sense. We then present a counter-example to show that
the ML estimator for the performance of an ideal observer is not guaranteed to
be minimum variance.

We will first describe a limited sense in which P̂ is efficient. Let

$$
M(R_0, R_1) = \sum_{i=1}^{N} \sum_{j=1}^{N} u(r_{1j} - r_{0i})
$$

so that P̂(R₀, R₁) = M(R₀, R₁)/N². Then, for a given value of P, the
probability distribution function for M is simply the binomial distribution

$$
P_M(M(R_0, R_1) = m \mid P) = \binom{N^{2}}{m} P^{m} (1 - P)^{N^{2} - m} \qquad (19)
$$



Therefore it can be verified that the calculation

$$
\left. \frac{\partial \log P_M(m \mid P)}{\partial P} \right|_{P = \hat{P}_{ml}(m)} = 0
$$

gives the ML estimator as

$$
\hat{P}_{ml}(m) = \frac{m}{N^{2}}. \qquad (20)
$$
We note also that

$$
\frac{\partial \log P_M(m \mid P)}{\partial P} = \frac{N^{2}}{P(1 - P)} \left[ \hat{P}_{ml}(m) - P \right],
$$

i.e., P̂_ml(m) satisfies the sufficient condition to be an efficient estimator
(Van Trees, 1966, pp. 66-73). In addition, E_M(P̂_ml(m)) = P. Therefore,
P̂_ml(m) = m/N² is an unbiased and efficient estimator of P. However, it is
important to note that P̂_ml(m) is an estimator of P as a function of the
transformed random variable M(R₀, R₁) and not as a function of R₀ and R₁. The
following counter-example shows that it is not possible to guarantee that ML
estimators of P are minimum variance.
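The sufficient condition can be checked numerically. The sketch below is
illustrative only and takes the binomial form of Equation (19) as given: it
compares a finite-difference derivative of the binomial log-likelihood with
the closed form N²/(P(1 − P)) · (m/N² − P). The function names are
hypothetical, and `n_pairs` stands for N².

```python
import math


def log_binomial_pmf(m: int, n_pairs: int, p: float) -> float:
    """log of C(n_pairs, m) * p^m * (1 - p)^(n_pairs - m)."""
    return (math.lgamma(n_pairs + 1) - math.lgamma(m + 1)
            - math.lgamma(n_pairs - m + 1)
            + m * math.log(p) + (n_pairs - m) * math.log(1.0 - p))


def score_numeric(m: int, n_pairs: int, p: float, h: float = 1e-6) -> float:
    """Central finite-difference approximation of d/dP log P_M(m | P)."""
    return (log_binomial_pmf(m, n_pairs, p + h)
            - log_binomial_pmf(m, n_pairs, p - h)) / (2.0 * h)


def score_closed_form(m: int, n_pairs: int, p: float) -> float:
    """N^2 / (P(1 - P)) * (m / N^2 - P), with n_pairs standing for N^2."""
    return n_pairs / (p * (1.0 - p)) * (m / n_pairs - p)


if __name__ == "__main__":
    # N = 100 trials per condition; the count m is chosen arbitrarily.
    n_pairs, m = 100 ** 2, 5600
    for p in (0.4, 0.56, 0.7):
        print(p, score_numeric(m, n_pairs, p), score_closed_form(m, n_pairs, p))
```

For each value of P tried, the two printed derivatives should agree to within
finite-difference error, and both vanish at P = m/N².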

Let the two conditional distributions be exponential, with
p₀(r₀) = α₀ exp(−α₀r₀) and p₁(r₁) = α₁ exp(−α₁r₁). We can calculate P for this
case using Equation (2) as P(R₀, R₁) = α₀/(α₁ + α₀). Let us note that

1. if α₁ = α₀, then P = 0.5,

2. P → 1 as α₀ → ∞ for a given α₁ < ∞ (i.e., as the mass in the tail of the
distribution p₀ accumulates while that of p₁ remains constant), and

3. P → 0 as α₀ → 0 for a given α₁ < ∞.

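Thus the conditional density of the observed variables for a given value of P
can be written as

$$
p(r_0, r_1 \mid P) = \alpha_0 \alpha_1 \exp\left\{ -\left[ \alpha_1 \frac{P}{1 - P} r_0 + \alpha_0 \frac{1 - P}{P} r_1 \right] \right\}
$$

which gives

$$
\frac{\partial \log p(r_0, r_1 \mid P)}{\partial P} = \frac{\alpha_0 r_1}{P^{2}} - \frac{\alpha_1 r_0}{(1 - P)^{2}} \qquad (21)
$$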



Equating the right hand side to 0, we obtain the ML estimator for P in this
case as

$$
\hat{P}_{ml}(r_0, r_1) = \frac{1}{1 + \sqrt{\alpha_1 r_0 / (\alpha_0 r_1)}}.
$$

We now note that Equation (21) cannot be put in the form

$$
\frac{\partial \log p(r_0, r_1 \mid P)}{\partial P} = T(P) \left[ \hat{P}_{ml}(r_0, r_1) - P \right],
$$

where T(P) is a function of P alone. Therefore, the sufficient condition for
P̂_ml(r₀, r₁) to be efficient is not satisfied (e.g., Van Trees, 1966, pp.
66-73). Further, it is also clear that P̂_ml(r₀, r₁) is a biased estimator.
Hence the ML estimator of P for this case cannot be guaranteed to be minimum
variance.
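The bias of this estimator can be illustrated by simulation. The sketch below
is a minimal example under stated assumptions: it draws single response pairs
from the two exponential densities, treats the rates α₀ and α₁ as known (as
the expression for the ML estimate does), and averages
1/(1 + √(α₁r₀/(α₀r₁))) over many pairs. The function names, the number of
draws, and the seed are hypothetical choices.

```python
import math
import random


def true_performance(a0: float, a1: float) -> float:
    """P = a0 / (a0 + a1) for exponential densities with rates a0 and a1."""
    return a0 / (a0 + a1)


def ml_estimate(r0: float, r1: float, a0: float, a1: float) -> float:
    """ML estimate 1 / (1 + sqrt(a1 * r0 / (a0 * r1))) from one pair (r0, r1).

    Written in the algebraically equivalent form
    sqrt(a0 * r1) / (sqrt(a0 * r1) + sqrt(a1 * r0)) to avoid dividing by r1.
    """
    num = math.sqrt(a0 * r1)
    return num / (num + math.sqrt(a1 * r0))


def mean_ml_estimate(a0: float, a1: float, draws: int = 200_000, seed: int = 0) -> float:
    """Monte Carlo average of the ML estimate over simulated response pairs."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(draws):
        r0 = rng.expovariate(a0)  # response under condition 0, rate a0
        r1 = rng.expovariate(a1)  # response under condition 1, rate a1
        total += ml_estimate(r0, r1, a0, a1)
    return total / draws


if __name__ == "__main__":
    for a0, a1 in ((1.0, 1.0), (4.0, 1.0)):
        print(true_performance(a0, a1), mean_ml_estimate(a0, a1))
```

In this sketch the average is close to 0.5 when α₀ = α₁, where the estimator
is symmetric, but falls below the true value of 0.8 when α₀ = 4 and α₁ = 1,
consistent with the estimator being biased.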

DISCUSSION 

We proposed a general form of an ideal observer for decod- 
ing stimulus information and perceptual decisions from neural 
responses. We showed that several ideal observer models used 
in previous studies are special cases of this general form. We 
investigated the statistical properties of this general ideal observer 
model. These analyses provide various tools for designing experiments with the
goal of using an ideal observer analysis on neural data. We have provided a
lower bound on the number of observations required for the estimate to lie
within a pre-specified range of its true value ("confidence interval"), at a
specified confidence level.

We also showed that there is not a uniformly "best" (i.e., 
minimum variance) estimator for the performance of the ideal 
observer since the existence of such an estimator depends on the 
parametric forms of the underlying probability distributions. It is 
sometimes argued that computing the area under the ROC curve 
offers a non-parametric way of estimating ideal observer perfor- 
mance. While it is true that this estimation procedure does not 
depend on the parametric forms of the underlying probability 
distributions, it is important to note that the resulting estimate 
will be invariably influenced by the underlying parametric forms. 
Therefore, for some parametric forms and under some condi- 
tions, neither the estimators provided in Equations (4) and (5) 
nor the area under the ROC curve will be efficient. However, 
regardless of which estimator is chosen, the relationship between 
the number of trials, the confidence interval and the confidence 
level derived in this paper can be used to design the experiment 
and validate the results. 

It is worth noting that the number of trials required for the estimate to lie
within a confidence interval at a given confidence level is not the optimum
number of trials required for reaching the decision. Therefore, in certain
applications other methods, such as sequential probability ratio tests, may be
more appropriate (Wald, 1945).